Reproducible Data Reports with Quarto

Myles Mitchell @ Jumping Rivers

Welcome!

https://rss-quarto.jumpingrivers.training/welcome/

Password: avocado-lime

Screenshot of log in page

Before we start…

Who am I?

  • Background in Astrophysics.

  • Data Scientist @ Jumping Rivers:

    • Python support for various clients.

    • Teach courses in Python, SQL and ML.

  • Hobbies include hiking and travelling.

Jumping Rivers

↗   jumpingrivers.com     𝕏   @jumping_uk

  • Machine learning
  • Dashboard development
  • R packages
  • APIs
  • Data pipelines
  • Code review

Introduction to Quarto

What is Quarto?

Quarto is an open-source scientific and technical publishing system created by Posit (formerly RStudio).

Workflow diagram starting with a qmd file, then Jupyter, then md, then pandoc, then PDF, MS Word, or HTML.
  • “Next-gen” R Markdown: writing documents that combine code with text.
  • Tables and plots can be quickly regenerated if the data changes.
  • Works for multiple languages and output formats (see the Quarto gallery).
  • These HTML slides were built in Quarto!
  • Quarto files have the .qmd extension.
  • Download it from quarto.org/docs/get-started.

Compatible with multiple IDEs

  • RStudio IDE
  • VS Code
  • Jupyter
  • Text Editor

What about R Markdown?

R Markdown isn’t going anywhere, but…

  • Quarto has better multi-language support.

  • More user-friendly.

  • Better control of the output layouts.

  • There is an R package for using Quarto from R: library(quarto).

  • New features will be added to Quarto.
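
The R package mentioned above can also render documents from the R console, without clicking the Render button. A minimal sketch, assuming the quarto R package and the Quarto command line tool are both installed; the file name my-report.qmd and the helper render_if_possible() are hypothetical:

```r
# Hypothetical helper: render a Quarto document from R.
# Returns FALSE (invisibly) if nothing could be rendered.
render_if_possible <- function(path) {
  if (!requireNamespace("quarto", quietly = TRUE) || !file.exists(path)) {
    message("Skipping render: quarto package or input file not found.")
    return(invisible(FALSE))
  }
  # quarto_render() calls the Quarto command line tool on the input file.
  quarto::quarto_render(input = path)
  invisible(TRUE)
}

render_if_possible("my-report.qmd")  # hypothetical file name
```

The guard means the call is skipped, rather than failing, when the package or the file is missing.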

Creating a new Quarto document

  • File > New File > Quarto Document

  • Set title and author

  • Click Create

  • Save and click Render

YAML header

YAML: “YAML Ain’t Markup Language” (originally “Yet Another Markup Language”)

---
title: "A very cool title"
author: "Myles Mitchell"
format: html
---
  • Set title, subtitle, date, author, etc.
  • Set the output format.
  • Apply any formatting that should affect the entire document.
  • Import pre-defined styles using Quarto extensions.

Document content

  • Text
  • Links
  • Images
  • Code
  • Embedded tables and plots
  • References

Then click the Render button.

Inserting text - Markdown

  • Fonts

    **bold**
    *italic*
  • Headings

    # Heading level 1
    ## Heading level 2
    ### Heading level 3
    #### Heading level 4
    ##### Heading level 5
    ###### Heading level 6
  • Bullet points (use -, + or *)

    - Banana
    - Pear
    - Apple
  • Numbered lists (automatic indexing!)

    1. Royal Gala
    1. Pink Lady
    1. Granny Smith
  • Nested lists

    - Banana
    - Pear
    - Apple
      1. Royal Gala
      1. Pink Lady
      1. Granny Smith

Including images

  • Use the Visual editor: click Insert > Figure/Image

  • Or type into a Quarto document:

    ![](img/lemur.jpg){fig-alt="Brown-collared lemur looking directly at the camera"}

Task 1: Your first Quarto document

  • File > New File > Quarto Document.

  • Set title and author.

  • Click Create.

  • Save and click Render.

  • Add the text from task1.txt.

  • Add a link to the Duke Lemur Center https://lemur.duke.edu/.

  • Add the image of the Mongoose Lemur.

15:00

Code Chunks

… and lemurs…

Loading data

You can load in your data using R:

```{r}
#| message: false
library(readr)
lemurs <- read_csv(
  "https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-08-24/lemur_data.csv",
  col_types = cols_only(sex = "c", age_at_wt_mo = "d", weight_g = "d"))
```

Chunk Options

  • Show the code:

    #| echo: true
  • Hide the code:

    #| echo: false
  • Show the code together with its fence and chunk options:

    #| echo: fenced
  • Don’t evaluate the code:

    #| eval: false
  • Hide messages and warnings produced by the code:

    #| message: false
    #| warning: false
  • Can set default code behaviour in the YAML header:

    ---
    title: "My Document"
    author: "Myles Mitchell"
    format: html
    execute:
      message: false
    ---
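
As a rough illustration of why messages and warnings have separate options: R treats them as different kinds of output. The sketch below is a made-up example (not part of the lemur analysis) that produces one of each. The #| lines act as chunk options inside an {r} chunk and are plain comments anywhere else:

```r
#| message: false
#| warning: false

# message() produces a message, which the `message` option controls.
message("Loading data...")

# sqrt() of a negative number produces a warning ("NaNs produced"),
# which the `warning` option controls.
roots <- sqrt(c(4, -1))
roots
```

With both options set to false, only the value of roots would appear in the rendered document.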

Code collapsing

If we’re just loading in packages, perhaps we should collapse the code…

```{r}
#| message: false
#| code-fold: true
library(dplyr)
library(tidyr)
library(ggplot2)
```

… do some data wrangling, and print the output as a table…

```{r}
#| output-location: slide
# Drop rows with missing values and keep lemurs older than 12 months
df <- lemurs %>%
  drop_na() %>%
  filter(age_at_wt_mo > 12)

df %>% 
  head() %>% 
  knitr::kable()
```
sex   weight_g   age_at_wt_mo
M         1086         125.82
M         1190         129.93
F          947         131.11
F         1174         135.42
M          899         100.64
M          917         101.06

…or include some exploratory plots!

```{r}
#| message: false
#| output-location: slide
#| fig-cap: "Age vs weight of lemurs"
#| fig-alt: "Scatter plot showing positive relationship between lemur age and weight split by sex"
#| out-width: 150%
ggplot(data = df,
       mapping = aes(x = age_at_wt_mo,
                     y = weight_g,
                     colour = sex)) +
  geom_point(alpha = 0.1) +
  geom_smooth(method = "lm") +
  labs(title = "Weight of lemurs",
       x = "Age (months)",
       y = "Weight (g)")
```

Figure: Age vs weight of lemurs. Scatter plot showing a positive relationship between lemur age and weight, split by sex.

Inline code

```{r}
num_obs = nrow(lemurs)
```

We can also include code inline, rather than as a separate chunk.

The number of observations is `r num_obs`.

Rendered output: The number of observations is 82609.
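
Inline values can be formatted before they are inserted into the text. A small sketch, using the observation count shown above as a stand-in for nrow(lemurs); the name num_obs_label is an illustrative choice:

```r
# Stand-in for nrow(lemurs), using the count reported above.
num_obs <- 82609

# Add a thousands separator so the number reads well in a sentence.
num_obs_label <- format(num_obs, big.mark = ",")
num_obs_label
```

Writing `r num_obs_label` in the text would then give 82,609.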

Task 2: Adding code

  • Add a code block to your document to load the data and some libraries. Use the code in task-2.R.

  • Set the code chunk options to hide the code and messages.

  • Add a second code block to make a scatter plot (use the code given or make your own).

  • Add a caption to the plot with the fig-cap code chunk option.

  • Add a final code chunk to find the average weight, and show the code.

15:00

Not Convinced?

Let’s explore some more use cases!

Polished presentations

  • Great way to demonstrate code:

    ggplot(data = df,
           mapping = aes(x = age_at_wt_mo,
                         y = weight_g,
                         colour = sex)) +
      geom_point(alpha = 0.1) +
      geom_smooth(method = "lm") +
      labs(title = "Weight of lemurs",
           x = "Age (months)",
           y = "Weight (g)")
  • Engaging format for teaching!

  • Easy to add elements like progress bar, slide numbering, etc.

  • Nice animations.

  • Flexible CSS styling that can be shared.

Journals

  • LaTeX backend for PDF documents.

  • Correctly formatted journal articles in 30 seconds? Try quarto-journals.

  • E.g. for a Journal of Statistical Software article:

    ---
    title: "My Document"
    format:
      pdf: default
      jss-pdf:
        keep-tex: true
    ---
  • More journal templates: github.com/mcanouil/awesome-quarto#journals.

Parameterised reports

  • Add parameters to your document using the YAML header:

    ---
    title: "My Report About Penguins"
    format: html
    params:
      species: "Adelie"
    ---
  • Access these parameters with params$species

    penguins_subset = dplyr::filter(penguins, species == params$species)
  • Parameters can be overridden from the command line, with no manual edits to the document!

    quarto render slides.qmd -P species:"Gentoo"
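
The same override can be scripted from R to produce one report per parameter value. A sketch under assumptions: the quarto R package and the Quarto command line tool are installed, the report is saved as report.qmd (a hypothetical name), and the dataset contains the species Adelie, Chinstrap and Gentoo. The helper render_species() and the output file names are illustrative:

```r
# Assumed parameter values and one output file name per species.
species <- c("Adelie", "Chinstrap", "Gentoo")
output_files <- paste0("penguins-", tolower(species), ".html")

# Hypothetical helper: render the report for a single species.
render_species <- function(input, species, output_file) {
  quarto::quarto_render(
    input = input,
    output_file = output_file,
    execute_params = list(species = species)  # overrides params$species
  )
}

# Only attempt rendering when the package and the report are available.
if (requireNamespace("quarto", quietly = TRUE) && file.exists("report.qmd")) {
  for (i in seq_along(species)) {
    render_species("report.qmd", species[i], output_files[i])
  }
}
```

Each pass through the loop plays the role of one quarto render call with a different -P value.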

Multi-language support

  • Everything we’ve shown here can also be done in Python!

  • Fence code cells with {python} instead of {r}.

  • VS Code and JupyterLab are both supported by Quarto.

  • Inline code and parameterised reporting are handled differently in Python.

  • See Python Quarto docs for details.

Publishing

It’s easy to share your documents!

  • Quarto Pub is free to use and specifically designed for Quarto content.

  • GitHub Pages (like this presentation).

  • Posit Connect

  • Netlify

  • Top tip: create a self-contained HTML file with no external dependencies:

    ---
    title: "Report with embedded images"
    format:
      html:
        embed-resources: true
    ---

Thanks for listening!